Multi-omics signatures in new-onset diabetes predict metabolic response to dietary inulin: findings from an observational study followed by an interventional trial

Aim The metabolic performance of the gut microbiota contributes to the onset of type 2 diabetes. However, targeted dietary interventions are limited by the highly variable inter-individual response. We hypothesized (1) that the composition of the complex gut microbiome and metabolome (MIME) differ across metabolic spectra (lean-obese-diabetes); (2) that specific MIME patterns could explain the differential responses to dietary inulin; and (3) that the response can be predicted based on baseline MIME signature and clinical characteristics. Method Forty-nine patients with newly diagnosed pre/diabetes (DM), 66 metabolically healthy overweight/obese (OB), and 32 healthy lean (LH) volunteers were compared in a cross-sectional case-control study integrating clinical variables, dietary intake, gut microbiome, and fecal/serum metabolomes (16 S rRNA sequencing, metabolomics profiling). Subsequently, 27 DM were recruited for a predictive study: 3 months of dietary inulin (10 g/day) intervention. Results MIME composition was different between groups. While the DM and LH groups represented opposite poles of the abundance spectrum, OB was closer to DM. Inulin supplementation was associated with an overall improvement in glycemic indices, though the response was very variable, with a shift in microbiome composition toward a more favorable profile and increased serum butyric and propionic acid concentrations. The improved glycemic outcomes of inulin treatment were dependent on better baseline glycemic status and variables related to the gut microbiota, including the abundance of certain bacterial taxa (i.e., Blautia, Eubacterium halii group, Lachnoclostridium, Ruminiclostridium, Dialister, or Phascolarctobacterium), serum concentrations of branched-chain amino acid derivatives and asparagine, and fecal concentrations of indole and several other volatile organic compounds. Conclusion We demonstrated that obesity is a stronger determinant of different MIME patterns than impaired glucose metabolism. The large inter-individual variability in the metabolic effects of dietary inulin was explained by differences in baseline glycemic status and MIME signatures. These could be further validated to personalize nutritional interventions in patients with newly diagnosed diabetes.

Peripheral venous blood sample was drawn from each subject after 12 hours of fasting. Parameters of glucose homeostasis (fasting plasma glucose, glycated hemoglobin (HBA1c), C-peptide and insulin) and lipid profile (total cholesterol, high-density lipoprotein cholesterol, low-density lipoprotein cholesterol and triacylglycerides) were assessed in a certified hospital laboratory. Serum zonulin was detected using Human Zonulin ELISA Kit (Elabscience).

Insulin sensitivity and secretion
Insulin sensitivity and secretion were evaluated using data from oral glucose tolerance test (OGGT). OGTT (75g glucose) was performed after 12-hour fasting according to WHO recommendation. First, baseline blood samples were drawn, than the sampling was done 30-minute intervals for two hours yielding 5 values for each subject. Incremental AUCs for glucose and insulin were calculated using trapezoid rule. Insulin sensitivity alone was expressed as Matsuda Index (MI) as published [5].

Sample manipulation and storage
Stool collected at home was immediately stored at −20°C until transported in the frozen state to the laboratory. Once thawed, the four fold of water was added to the sample (up to 10 g), and samples were homogenized using stomacher (BioPro, CR). Immediately after homogenization, an aliquot (600 ul) was taken for DNA analysis. pH was determined in the rest of the sample and the homogenate was sonicated for 1 minute at the maximal amplitude and cycles (UP200S, Heischler Ultrasound Technology). Sonicated samples were used for dry mass estimation and aliquoted and stored at -50 °C until metabolome analysis.
Blood samples were drawn from median cubital vein into Vacutainer tube. For serum, the blood was left standing on the bench for 30 min to clot and then separated by centrifugation. For plasma, the blood was collected into Vacutainer with the anticoagulant, immediately mixed by gently inverting the tube five times and then separated by centrifugation. Parameters of glucose homeostasis were measured in a certified hospital laboratory: fasting plasma glucose using the hexokinase reaction (KONELAB, Dreieich, Germany); C-peptide by using solid-phase competitive chemiluminescent enzyme immunoassay (Immulite 2000, Los Angeles, CA, USA); HbA1c by using high-pressure liquid boronate affinity chromatography (Primus Corporation, Kansas city, MO, USA); and insulin using solid-phase competitive chemiluminescent enzyme immunoassay (Immulite 2000). For the lipid profile, we measured total cholesterol and triglycerides using an enzymatic method kit (KONELAB); high-density lipoprotein-cholesterol measured using a polyethylene glycol-modified enzymatic assay kit (ROCHE, Basel, Switzerland); and low-density lipoprotein-cholesterol calculated using the standard Friedewald equation.

Gut microbiome analysis Fecal Sample Collection and Bacterial DNA Extraction.
Stool collected at home was immediately stored at -20°C until transported in the frozen state to the laboratory. Until isolation, samples were stored at -50 °C. For DNA isolation, 200-250 mg of stool was cut on dry ice. DNA was isolated by QIAmp PowerFecal DNA Kit (Qiagen), according to manufacturer recommendation.

16S rRNA gene Library Preparation and Sequencing
Quality of DNA was determined using gel electrophoresis and concentration was assessed spectrophotometrically using microplate reader (Synergy Mx, BioTek, USA). For identification of bacteria presented in samples, the sequencing of 16S rRNA gene was performed. Extracted DNA was used as a template in amplicon PCR to target the hypervariable region V4 of the bacterial 16S rRNA. The library was prepared according to the Illumina 16S Metagenomic sequencing Library Preparation protocol with some deviations described below (1). The total reaction volume of PCR was 30 µl consisting of 15 μl Q5 HighFidelity 2x MM (BioLabs, New England), 1.5 μl of each 10 μM primer, 9 μl of PCR water and and 3 μl of template. The cycling parameters included initial denaturation at 98 °C for 30 s, followed by 30 cycles of 10 s denaturation at 98 °C, 15 s annealing at 55 °C and 30 s extension at 72 °C, followed by final extension at 72 °C for 2 min. The primer pair consisting of Illumina overhang nucleotide sequences, an inner tag and gene-specific sequences. The Illumina overhang served to ligate the Illumina index and adapter. Each inner tag, i.e. a unique sequence of 7-9 bp, was designed to differentiate samples into groups. The amplified PCR products were determined by gel electrophoresis. PCR clean-up was performed with SPRIselect beads (Beckman Coulter Genomics). Samples with different inner tags were equimolarly pooled based on fluorometrically measured concentration using Qubit® dsDNA HS Assay Kit (Invitrogen™, USA) and microplate reader (Synergy Mx, BioTek, USA). Pools were used as a template for a second PCR with Nextera XT indexes (Illumina, USA). Differently indexed samples were checked and quantified using the three methods: qPCR using LightCycler 480 Instrument (Roche, USA) and KAPA Library Quantification Complete Kit (Roche, USA); 2100 Bioanalyzer Instrument using the High Sensitivity D1000 ScreenTape (Agilent Technologies, USA) and microplate reader (Synergy Mx, BioTek, USA) Qubit® using dsDNA HS Assay Kit (Invitrogen™, USA). Samples were equimolarly pooled according to the measured concentration. The prepared library was checked with the same methods and concentration was measured shortly prior sequencing. The final library was diluted to a concentration of 8 pM and 20 % of PhiX DNA (Illumina, USA) was added. Sequencing was performed with the Miseq reagent kit V2 using a MiSeq instrument according to the manufacturer's instructions (Illumina, USA).

Data processing
Paired reads from 16s rRNA sequencing were first processed using an in-house pipeline implemented in Python 3. Steps of processing included trimming of low-quality 3' ends of reads, removal of read pairs containing unspecified base N and removal of pairs containing very short reads. In order to minimize sequencing and PCR-derived error, forward and reverse reads were denoised using the DADA2 amplicon denoising R package (2). Following denoising, the forward and reverse reads were joined into a single longer read using the fastq-join read joining utility (3). In order to be joined, reads in pairs had to have an overlap of at least 20 base pairs with no mismatches allowed. Pairs in which this was not the case were discarded. As the final step, chimeric sequences were removed from the joined reads using the remove Bimera function of the DADA2 R package. Subsequent taxonomic assignment was conducted by the uclustconsensus method from the QIIME (4) microbial analysis framework using the Silva v. 123 (5) reference database. In cross-sectional study, we found 44,332 ASVs and identified 13 phyla, 30 classes, 56 orders, 104 families and 367 genera. The median sequencing coverage was 27,515 ASV per sample (min 14,382; max 74,538). In prospective study, we detected 9114 ASVs and identified 15 phyla, 37 classes, 60 orders, 97 families and 285 genera. The median sequencing coverage was 28,383 ASV per sample (min 14,382; max 55,923).

Determination of short-chain fatty acids in serum
SCFAs were analyzed in plasma by LC-MS according to a method described before (6) with minor modifications. Briefly, fifty microliters of a mixed standard solution containing 4 mM of formic acid and acetic acid, 2 mM of propionic acid, and 1 mM of each of the other six SCFAs were added to a 2 mL borosilicate test tube that contained 1 mg of 13 C6-3NPH HCl. Twenty-five microliters of 120 mM EDC-6% pyridine solution and twenty-five microliters 75% MeOH were then added to the mixture. The mixture was reacted at 4 °C for 4 hours. Twenty-five microliters quinic acid in MeOH was added and quenching proceeded for 45 min. After quenching, the mixture was transferred to a volumetric flask with 10% MeOH and diluted with the same solvent to 100 mL. This solution was used as the internal standard mix and was stored in aliquots at -20 °C. In total, 10 μl plasma was incubated with 60 μl 75% methanol, 10 μl 200 mM 3-NPH and 10 μl 120 mM EDC-6% pyridine at ambient temperature for 45 min with shaking. The reaction was quenched by addition of 10 μl of 200 mM quinic acid (15 min with shaking). The samples were centrifuged at 15 000 g for 5 min and the supernatant moved to a new tube. The samples were made up to 1 mL by 10% methanol in water and again centrifuged at 15 000 g for 5 min. In total, 100 μl of the derivatised ( 12 C 3NPH) sample was mixed with 100 μl of labelled ( 13 C 3NPH) internal standard. A mixed external standard solution containing 3,2 µM -0,63 nM of formic acid and acetic acid, 3,2 µM -0,31 nM of propionic acid, and 0,8 µM -0,16 nM of each of the other six SCFAs were always prepared fresh and used for each batch. Samples were analyzed by a 6500+ QTRAP triple-quadrupole mass spectrometer (AB Sciex, 11432 Stockholm, Sweden) which was equipped with an APCI source and operated in the negative-ion mode. Chromatographic separations were performed on a Phenomenex Kinetix Core-Shell C18 (2.1, 100 mm, 1.7 um 100 Å) UPLC column with SecurityGuard ULTRA Cartridges (C18 2.1mm ID) (changed at regular intervals at). The column was backflushed for 60 min between each batch to ensure good chromatographic separation. Water (100% solvent A) and acetonitrile (100% solvent B) was the mobile phases for gradient elution. The column flow rate was 0.4 mL/min and the column temperature was 40 °C, the autosampler was kept at 4 °C. LC starting conditions at 0.5% B, held for 3 min, 3 min 2.5% B ramping linearly to 17% B at 6 min, then to 45% B at 10 min and 55% B at 13 min. Followed by a flush (100% B) and recondition (0.5% B), total runtime 15 min. The MRM transitions were optimized for the analytes one by one by direct infusion of the derivatives containing 10 µM of each fatty acid, essentially as according to Han et al. (7). The Q1/Q3 pairs were used in the MRM scan mode to optimize the collision energies for each analyte, and the two most sensitive pairs per analyte were used for the subsequent analyses. The retention time window for the scheduled MRM was 1 min for each analyte. The two MRM transitions per analyte, the Q1/Q3 pair that showed the higher sensitivity was selected as the MRM transition for quantitation. The other transition acted as a qualifier for the purpose of verification of the identity of the molecule. UPLC/MRM-MS data was acquired in the "scheduled MRM" mode using the Analyst 1.5 software and data processing was performed using the MultiQuant 3.0.3 software (AB Sciex, 11432 Stockholm, Sweden). Standards for SCFAs used were: Formic Acid (C1) (Scharlau), acetic acid (C2) (Honeywell), propionic acid (C3) (Alfa Aesar), butyric acid (C4) (Sigma Aldrich), isobutyric acid (C4) (Alfa Aesar), Succinic acid (C4) (Acros), isovaleric acid (C5) (Sigma Aldrich), valeric acid (C5) (Alfa Aesar) and caproic acid (C6) (Sigma Aldrich). Analytical reagent-grade 3-nitrophenylhydrazine (3NPH)-HCl (97%), 2-nitrophenylhydrazine N-(3-dimethylaminopropyl)-N0-ethylcarbodiimide (EDC) HCl, quinic acid, HPLC grade pyridine and Lichrosol reagent grade MeOH and water was obtained from Sigma-Aldrich. Acetonitrile Optima LCMS Grade was obtained from Fisher scientific. 13 C6-3NPH-HCl was custom synthesized to us by IsoSciences Inc. (King of Prussia, PA, USA) (catalogue 13309). This custom-synthesized compound was structurally confirmed by 1H NMR spectroscopy and by MS/MS on a triple-quadruple mass spectrometer.

Volatile compounds analysis of feces
Stool was homogenized and diluted to equivalent of 1% (w/w) dry mass. This was pippeted into a 10 mL vial for headspace analysis, and prior sealing with a magnetic cap, 20 μl of sodium azide water solution (0.2%, w/v) was added as a bacteriostatic agent. Volatiles fingerprinting was performed using an Agilent 7890B gas chromatograph coupled to Leco Pegasus 4D time of flight mass spectrometer. The instrument was equipped with a multi-purpose autosampler (MPS, Gerstel, USA), performing heated incubation, steering, and volatiles collection onto a solid-phase microextraction fiber with a divinylbenzen/carboxen/ polydimethylsiloxan (DVB/CAR/PDMS 50/30 µm) coating from Supelco (USA).
The sample was incubated for 10 min and volatiles extracted onto a fibers stationary phase for 20 minutes, both at steering at temperature of 60 °C. Separation was performed on GC capillary column HP-Innowax (30 m × 0.25 mm i.d., 0.25 µm film thickness; Agilent Technologies, USA) with splitless injection at 265 °C. The GC oven temperature program was as follows: 40 °C for 1 min; then ramped at a rate of 10 °C/min to 180 °C; then at 20 °C/min to 260 °C and held for 6 min for a total GC run time of 25 min.
Time of flight mass spectrometer was operated with acquisition speed of 10 Hz to obtain full spectral information in a mass range 35-350 Da. Peak find,mass spectral deconvolution and subsequent peak alignment were performed in ChromaTOF software (LECO, USA). Compounds with a quantification mass signal to noise ratio (S/N), higher than 100 and present in more than 50 % of smallest sample class, were selected for alignment. For signals from different samples, to be listed in the aligned table as a single compound, retention time (maximal difference of 5 s) and spectral similarity at least 60 % must be met. n the aligned table, areas of quantification masses for each aligned compound, with tentative identificationwere provided. This tentative ID is based on spectral similarity of deconvoluted mass spectrum of signal and spectra in NIST 2017 mass spectral library. Further confirmation of signals identity was based on comparison of measured retention index and retention indexes in the NIST library. An aligned table was exported to Microsoft Excel, where constant sum normalization was performed. Thus each compounds quantification mass area was divided by sum of all signals quantification mass areas in respective sample.

NMR analysis
Serum samples were analyzed after protein precipitation. Aliquot of 220 µl serum sample was mixed with 440 µl cold methanol. The mixture was kept in freezer at -20 °C for 30 minutes and then centrifuged at 18 620 g for 10 minutes at 4 °C. The supernatant was transferred into fresh vial and vacuum dried. Evaporated supernatant was dissolved in 450 µl D2O with 50 µl 1.5 M phosphate buffer and 50 μl 0.1% TSP, and then transferred into 5mm NMR tube.
NMR data were acquired on a 600 MHz Bruker Avance III spectrometer (Bruker BioSpin, Rheinstetten, Germany) equipped with a 5mm TCI cryogenic probe head. All experiments were performed using Topspin 3.5 software at 300 K with automatic tuning and matching, shimming and adjusting 90° pulse length for each sample. Serum data were analyzed from Carr-Purcell-Meiboom-Gill (CPMG) spectra acquired by cpmgpr1d pulse sequence with following acquisition parameters; number of scans NS=192, spectral width SW=20 ppm, 64k of data points (TD), relaxation delay for water presaturation d1=4 s, echo time 0.3 ms, loop for T2 filter 126. J-resolved experiment (NS=2, SW=16, TD=8k, number of increments=40, SW=78.125 Hz in the indirect dimension, d1=2 s) was performed on each sample to facilitate metabolite identification. Additional heteronuclear single quantum correlation (HSQC) and total correlation spectroscopy (TOCSY) experiments were executed for selected samples.
Acquired data were processed with Topspin 3.5 software. CPMG spectra were line broadened (0.3 Hz), automatically phased, baseline corrected and referenced to the signal of TSP. The regions with signal of water and methanol were excluded and then spectra were normalized using probabilistic quotient normalization (PQN) method (8) to the pooled lean healthy group. Individual metabolites were identified using Chenomx software (Chenomx Inc., Edmonton, AB, Canada) and their proton and carbon data were then compared with the HMDB database (9). Metabolite concentrations were expressed as normalized intensities of corresponding signals in CPMG spectra.

Data analysis Identification of discriminating features between cohorts
The statistical analyses were performed using R software packages and in-house scripts (21). For individual tasks, the following R packages were used: composition (centered log-ratio transformation), zCompositions (zero multiplicative replacement) vegan (PERMANOVA), phyloseq (α-diversity), effsize (Cliff's delta), glmnet (LASSO logistic regression). Clinical characteristics of the observational sample were compared using standard tests. Prior to further analyses, all variables for which the sum of counts was below 0.01% of the total sum of all counts were removed from microbiome data and all variables for which the sum of AUC was below 0.01% of the total AUC were removed from fecal metabolome data. The microbiome and VOCs data were treated as compositional (proportions of total read count in each sample or proportion of the total area of selected masses), and before all statistical analyses, the data were transformed by centered log-ratio (clr) transformation with a multiplicative simple replacement for handling zero values. All data were scaled (z-score) before applying PERMANOVA, principal component analysis (PCA), or LASSO regression PCA was used to investigate possible sample clustering in each dataset. For each data type, multivariable statistics (PERMANOVA, 10000 permutations) were applied to test for differences between the groups. Univariable statistical analyses were performed by the Mann-Whitney-Wilcoxon test when testing two groups and the Kruskal-Wallis test when testing at least three groups. The results were adjusted for multiple hypothesis testing by the Benjamini-Hochberg procedure with cut-off levels at a false discovery rate equal to 0.10. We analyzed the discriminatory power of each omics dataset using machine learning; specifically, we used logistic regression with L1 penalization (i.e., LASSO) with ten times repeated 10-fold cross-validation. To handle imbalanced groups, each sample was assigned a weight inversely proportional to the number of samples within the respective group. The validity of a model was verified using a permutation test with 50 repetitions. Correlation networks based on Spearman's correlation coefficient were used to assess the correlation between the studied variables.

Classification into patients groups using machine learning
To classify patients into groups, we employed machine learning approach called Logistic regression with L1 penalization (i.e., Lasso) with 10 times repeated 10-fold cross-validation. To handle imbalanced groups, each sample have a weight inversely proportional to the number of samples within that group. The validity of each model was verified using permutation test with 50 repetitions. Lambda parameter was obtained by cross-validation using glmnet package and the miss-classification error was used as a loss function. All the data were scaled (z-score) before learning the models.

Predictive signatures in inulin interventions
The effect of each possible predictor variable on the clinical outcomes was assessed by linear regression where: x, name the predictor variable; y, name of the outcome variable; , value of a predictor x of a patient p in time A; , value of a predictor x of a patient p in time B; , value of an outcome y of a patient p in time A; βx, βy coefficients given by the model ; z(), z-score standardization function; error term. That is, we fit a model to predict the difference in the clinical outcome after intervention using individual predictors before intervention, confounding on the clinical value before intervention. Along with individual p-values for the variables, we report the resampled R 2 obtained using bootstrapping (50 iterations). We omitted variables with significant coefficients having high leverage observations. Such variables were identified by analyzing the graph of the univariable linear regression.